KMI, The Open University at NTCIR-9 CrossLink: Cross-Lingual Link Discovery in Wikipedia Using Explicit Semantic Analysis

نویسندگان

  • Petr Knoth
  • Lukás Zilka
  • Zdenek Zdráhal
چکیده

This paper describes the methods used in the submission of Knowledge Media institute (KMI), The Open University to the NTCIR-9 Cross-Lingual Link Discovery (CLLD) task entitled CrossLink. KMI submitted four runs for link discovery from English to Chinese; however, the developed methods, which utilise Explicit Semantic Analysis (ESA), are applicable also to other language combinations. Three of the runs are based on exploiting the existing cross-lingual mapping between different versions of Wikipedia articles. In the fourth run, we assume information about the mapping is not available. Our methods achieved encouraging results and we describe in detail how their performance can be further improved. Finally, we discuss two important issues in link discovery: the evaluation methodology and the applicability of the developed methods across different textual collections.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automated Cross-lingual Link Discovery in Wikipedia

At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the “translation” using cross-l...

متن کامل

NTCIR-10 CrossLink-2 Task: A Link Mining Strategy

At NTCIR-10 we participated in the cross-lingual link discovery (CrossLink-2) task. In this paper we describe our systems for discovering cross-lingual links between the Chinese, Japanese, and Korean (CJK) Wikipedia and the English Wikipedia. The evaluation results show that our implementation of the crosslingual linking method achieved promising results.

متن کامل

NTHU at NTCIR-10 CrossLink-2: An Approach toward Semantic Features

This paper describes the approaches of NTHU in the NTCIR-10 Cross-Lingual Link Discovery task, also named CrossLink-2. In this task, we aim to discover valuable anchors in Chinese, Japanese or Korean (CJK) articles and to link these anchors to related English Wikipedia pages. To achieve the objective, we do not only depend on Wikipedia’s distinguishing features (e.g. anchor links information an...

متن کامل

An evaluation framework for cross-lingual link discovery

Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery if a multi-lingual knowledge base is sparse in one language or another, or the topical coverage in each language is different; such is the case wit...

متن کامل

Osaka Kyoiku University at NTCIR-10 CrossLink-2: Link Filtering by Title Tag of Corpus as a Dictionary

Our group (OKSAT) submitted two types of runs named SMP and REF for every subtasks of NTCIR-10 Cross-lingual Link Discovery (CLLD). Our method uses titles in Wikipedia pages (corpus) of source language as a entries of a dictionary, so no external dictionary is required. For SMP, we aimed to discover cross-lingual links of actual Wikipedia, in other words it targets Wikipedia ground truth. For R...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011